Polar codes for classical-quantum channels 

Mark M. Wilde and Saikat Guha 



Abstract — Holevo, Schumacher, and Westmoreland's coding 
theorem guarantees the existence of codes that are capacity- 
achieving for the task of sending classical data over a chan- 
nel with classical inputs and quantum outputs. Although they 
demonstrated the existence of such codes, their proof does not 
provide an explicit construction of codes for this task. The aim 
of the present paper is to fill this gap by constructing near- 
explicit "polar" codes that are capacity-achieving. The codes 
exploit the channel polarization phenomenon observed by Arikan 
for the case of classical channels. Channel polarization is an 
effect in which one can synthesize a set of channels, by "channel 
combining" and "channel splitting," in which a fraction of the 
synthesized channels are perfect for data transmission while the 
other fraction are completely useless for data transmission, with 
the good fraction equal to the capacity of the channel. The 
channel polarization effect then leads to a simple scheme for 
data transmission: send the information bits through the perfect 
channels and "frozen" bits through the useless ones. The main 
technical contributions of the present paper are threefold. First, 
we leverage several known results from the quantum information 
literature to demonstrate that the channel polarization effect 
occurs for channels with classical inputs and quantum outputs. 
We then construct linear polar codes based on this effect, and the 
encoding complexity is O(N log N), where N is the blocklength
of the code. We also demonstrate that a quantum successive 
cancellation decoder works well, in the sense that the word error 
rate decays exponentially with the blocklength of the code. For 
this last result, we exploit Sen's recent "non-commutative union 
bound" that holds for a sequence of projectors applied to a 
quantum state. 

I. Introduction 

Shannon's fundamental contribution was to establish the 
capacity of a noisy channel as the highest rate at which a 
sender can reliably transmit data to a receiver [1]. His method
of proof exploited the probabilistic method and was thus non- 
constructive. Ever since Shannon's contribution, researchers 
have attempted to construct error-correcting codes that can 
reach the capacity of a given channel. Some of the most 
successful schemes for error correction are turbo codes and 
low-density parity-check codes [2], with numerical results
demonstrating that these codes perform well for a variety of 
channels. In spite of the success of these codes, there is no 
proof that they are capacity achieving for channels other than 
the erasure channel [3].

Recently, Arikan constructed polar codes and proved that 
they are capacity achieving for a wide variety of channels [4].
Polar codes exploit the phenomenon of channel polarization, 
in which a simple, recursive encoding synthesizes a set of 
channels that polarize, in the sense that a fraction of them become
perfect for transmission while the other fraction are completely
noisy and thus useless for transmission. The fraction of
the channels that become perfect for transmission is equal to 
the capacity of the channel. In addition, the complexity of both 
the encoding and decoding scales as O(N log N), where N
is the blocklength of the code. Arikan developed polar codes 
after studying how the techniques of channel combining and 
channel splitting affect the rate and reliability of a channel 
[5]. Arikan and others have now extended the methods of
polar coding to many different settings, including arbitrary
discrete memoryless channels [6], source coding [7], lossy
source coding [7], [8], and the multiple access channel with
two senders and one receiver [9].

All of the above results are important for determining both 
the limits on data transmission and methods for achieving 
these limits on classical channels. The description of a classical
channel $p_{Y|X}$ arises from modeling the signaling alphabet,
the physical transmission medium, and the receiver measure- 
ment. If we are interested in accurately evaluating and reaching 
the true data-transmission limits of the physical channels, with 
an unspecified receiver measurement, and whose information 
carriers require a quantum-mechanical description, then it 
becomes necessary to invoke the laws of quantum mechan- 
ics. Examples of such channels include deep-space optical 
channels and ultra-low-temperature quantum-noise-limited RF
channels. Achieving the classical communication capacity for 
such (quantum) channels often requires making collective 
measurements at the receiver, an action for which no classical 
description or implementation exists. The quantum-mechanical 
approach to information theory [10], [11] is not merely a
formality or technicality: encoding classical information with
quantum states and decoding with collective measurements on
the channel outputs [12], [13] can dramatically improve data
transmission rates, for example if the sender and receiver are
operating in a low-power regime for a pure-loss optical channel
(which is a practically relevant regime for long-haul free-space
terrestrial and deep-space optical communication) [14], [15].
Also, encoding with entangled inputs to the channels
can increase capacity for certain channels [16], a superadditive
effect which simply does not occur for classical channels. 

The proof of one of the most important theorems of quantum 
information theory is due to Holevo [12], Schumacher, and
Westmoreland [13] (HSW). They showed that the Holevo 
information of a quantum channel is an achievable rate for 
classical communication over it. Their proof of the HSW 
theorem bears some similarities with Shannon's technique 
(including the use of random coding), but their main con- 
tribution was the construction of a quantum measurement at 
the receiving end that allows for reliable decoding at the 
Holevo information rate. Since the proof of the HSW theorem, 
several researchers have improved the proof's error analysis 
[17], and others have demonstrated different techniques for
achieving the Holevo information [18], [19], [20], [21], [22].
Very recently, Giovannetti et al. proved that a sequential decoding
approach can achieve the Holevo information [23]. The
sequential decoding approach has the receiver ask, through 
a series of dichotomic quantum measurements, whether the 
output of the channel was the first codeword, the second 
codeword, etc. (this approach is similar in spirit to a classical
"jointly-typical" decoder [24]). As long as the rate of the
code is less than the Holevo information, then this sequential 
decoder will correctly identify the transmitted codeword with 
asymptotically negligible error probability. Sen recently sim- 
plified the error analysis of this sequential decoding approach 
(rather significantly) by introducing a "non-commutative union 
bound" in order to bound the error probability of quantum 
sequential decoding [25].

In spite of the large amount of effort placed on proving that 
the Holevo information is achievable, there has been relatively 
little work on devising explicit codes that approach the Holevo 
information rate.¹ The aim of the present paper is to fill this
gap by generalizing the polar coding approach to quantum 
channels. In doing so, we construct the first explicit class of 
linear codes that approach the Holevo information rate with 
asymptotically small error probability. 

The main technical contributions of the present paper are 
as follows: 

1) We characterize rate with the symmetric Holevo infor- 
mation ED, US, (TO), mi and reliability with the 
fidelity EU, E2, ED, CD between channel outputs 
corresponding to different classical inputs. These pa- 
rameters generalize the symmetric Shannon capacity 
and the Bhattacharyya parameter [4], respectively, to
the quantum case. We demonstrate that the symmetric
Holevo information and the fidelity polarize under a
recursive channel transformation similar to Arikan's [4],
by exploiting Arikan's proof ideas [4] and several tools
from the quantum information literature [30], [31], [32],

una, eg, urn. 

2) The second contribution of ours is the generalization 
of Arikan's successive cancellation decoder [4] to the 
quantum case. We exploit ideas from quantum hypothe- 
sis testing OH, El, ED, E3, (38) in order to construct 
the quantum successive cancellation decoder, and we use 
Sen's recent "non-commutative union bound" [25] in
order to demonstrate that the decoder performs reliably 
in the limit of many channel uses, while achieving the 
symmetric Holevo information rate. 

1 This is likely due to the large amount of effort that the quantum
information community has put towards quantum error correction [26], which
is important for the task of transmitting quantum bits over a noisy quantum
channel or for building a fault-tolerant quantum computer. Also, there might
be a general belief that classical coding strategies would extend easily for
sending classical information over quantum channels, but this is not the case
given that collective measurements on channel outputs are required to achieve
the Holevo information rate and the classical strategies do not incorporate
these collective measurements.

The complexity of the encoding part of our polar coding
scheme is O(N log N), where N is the blocklength of the
code (the argument for this follows directly from Arikan's
[4]). However, we have not yet been able to show that the
complexity of the decoding part is O(N log N) (as is the
case with Arikan's decoder [4]). Determining how to simplify
the complexity of the decoding part is the subject of ongoing
research. For now, we should regard our contribution in this
paper as a more explicit method for achieving the Holevo
information rate (as compared to those from prior work [12],

in, na, na, eqi, eed. 

One might naively think from a casual glance at our paper 
that Arikan's results [4] directly apply to our quantum scenario
here, but this is not the case. If one were to impose single-symbol
detection on the outputs of the quantum channels,²
such a procedure would induce a classical channel from input 
to output. In this case, Arikan's results do apply in that 
they can attain the Shannon capacity of this induced classical 
channel. 

However, the Shannon capacity of the best single-symbol 
detection strategy may be far below the Holevo limit [14],
[15]. Attaining the Holevo information rate generally requires
the receiver to perform collective measurements (physical 
detection of the quantum state of the entire codeword that 
may not be realizable by detecting single symbols one at a 
time). We should stress that what we are doing in this paper is 
different from a naive application of Arikan's results. First, our 
polar coding rule depends on a quantum parameter, the fidelity, 
rather than the Bhattacharyya distance (a classical parameter).
The polar coding rule is then different from Arikan's, and 
we would thus expect a larger fraction of the channels to be 
"good" channels than if one were to impose a single-symbol 
measurement and exploit Arikan's polar coding rule with the 
Bhattacharyya distance. Second, the quantum measurements in
our quantum successive cancellation decoder are collective 
measurements performed on all of the channel outputs. Were 
it not so, then our polar coding scheme would not achieve the 
Holevo information rate in general. 

We organize the rest of the paper as follows. The next 
section provides an overview of polar coding for classical- 
quantum channels (channels with classical inputs and quantum 
outputs). This overview states the main concepts and the 
important theorems, while saving their proofs for later in 
the paper. The main concepts include channel combining, 
channel splitting, channel polarization, rate of polarization, 
quantum successive cancellation decoding, and polar code
performance. Section III gives more detail on how recursive
channel combining and splitting lead to transformation of rate
and reliability in the direction of polarization. Section IV
proves that channel polarization occurs under the transformations
given in Section III (the proofs in Section IV are identical
to Arikan's [4] because they merely exploit his martingale
approach). We prove in Section V that the performance of
the polar coding scheme is good, by analyzing the error
probability under quantum successive cancellation decoding.
We finally conclude in Section VI with a summary and some
open questions.

2 For instance, all known conventional optical receivers are single-symbol 
detectors. They detect each modulated pulse individually, followed by classical 
postprocessing. 



II. Overview of Results 

Our setting involves a classical-quantum channel $W$ with a
classical input $x$ and a quantum output $\rho_x$:

\[ W : x \to \rho_x, \]

where $x \in \{0, 1\}$ and $\rho_x$ is a unit-trace, positive operator called
a density operator. We can associate a probability distribution
and a classical label with the states $\rho_0$ and $\rho_1$ by writing the
following classical-quantum state $\rho^{XB}$:

\[ \rho^{XB} = \frac{1}{2} |0\rangle\langle 0|^{X} \otimes \rho_0^{B} + \frac{1}{2} |1\rangle\langle 1|^{X} \otimes \rho_1^{B}. \]

Two important parameters for characterizing any classical-quantum
channel are its rate and reliability.³ We define the
rate in terms of the channel's symmetric Holevo information
$I(W)$, where

\[ I(W) = I(X;B)_\rho, \]

$I(X;B)_\rho$ is the quantum mutual information of the state $\rho^{XB}$,
defined as

\[ I(X;B)_\rho = H(X)_\rho + H(B)_\rho - H(XB)_\rho, \]

and the von Neumann entropy $H(\sigma)$ of any density operator
$\sigma$ is defined as

\[ H(\sigma) = -\mathrm{Tr}\{\sigma \log_2 \sigma\}. \]

(Observe that the von Neumann entropy of $\sigma$ is equal to the
Shannon entropy of its eigenvalues.) It is also straightforward
to verify that

\[ I(W) = H\big((\rho_0^B + \rho_1^B)/2\big) - H(\rho_0^B)/2 - H(\rho_1^B)/2. \]

The symmetric Holevo information is non-negative by concavity
of von Neumann entropy, and it can never exceed
one if the system $X$ is a classical binary system (as is the
case for the classical-quantum state $\rho^{XB}$). Additionally, the
symmetric Holevo information is equal to zero if there is no
correlation between $X$ and $B$. It is equal to the capacity of the
channel $W$ for transmitting classical bits over it if the input
prior distribution is restricted to be uniform [12], [13]. It also
generalizes the symmetric capacity [4] to the quantum setting
given above.
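
As a minimal numerical sketch of the last formula, the following Python snippet evaluates $I(W)$ in the special case where both output states are qubit states. The restriction to qubits, the Bloch-vector parametrization, and all function names are illustrative assumptions rather than part of the construction above; the sketch relies only on the standard fact that a qubit state with Bloch vector $r$ has eigenvalues $(1 \pm |r|)/2$.

```python
import math

def binary_entropy(p):
    """Shannon entropy (in bits) of the distribution (p, 1 - p)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def qubit_entropy(r):
    """von Neumann entropy of the qubit state with Bloch vector r (assumed |r| <= 1)."""
    norm = min(1.0, math.sqrt(sum(c * c for c in r)))
    return binary_entropy((1.0 + norm) / 2.0)

def symmetric_holevo_information(r0, r1):
    """I(W) = H((rho_0 + rho_1)/2) - H(rho_0)/2 - H(rho_1)/2 for qubit outputs."""
    avg = [(a + b) / 2.0 for a, b in zip(r0, r1)]
    return qubit_entropy(avg) - qubit_entropy(r0) / 2.0 - qubit_entropy(r1) / 2.0

# Orthogonal pure outputs |0>, |1>: a perfect channel.
perfect = symmetric_holevo_information((0, 0, 1), (0, 0, -1))
# Identical outputs: a useless channel.
useless = symmetric_holevo_information((0, 0, 1), (0, 0, 1))
# Non-orthogonal pure outputs |0>, |+>: strictly between the two extremes.
partial = symmetric_holevo_information((0, 0, 1), (1, 0, 0))
```

Orthogonal pure outputs give $I(W) = 1$ and identical outputs give $I(W) = 0$, which matches the two extremes described above.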

We define the reliability of the channel $W$ as the fidelity
between the states $\rho_0$ and $\rho_1$ [28], [29], HI:

\[ F(\rho_0, \rho_1) = \left\| \sqrt{\rho_0} \sqrt{\rho_1} \right\|_1^2, \]

where $\|A\|_1$ is the nuclear norm of the operator $A$:

\[ \|A\|_1 = \mathrm{Tr}\{\sqrt{A^\dagger A}\}. \]

Let $F(W)$ denote the reliability of the channel $W$:

\[ F(W) = F(\rho_0, \rho_1). \]

The fidelity is equal to a number between zero and one, and
it characterizes how "close" two quantum states are to one
another. It is equal to zero if and only if there exists a quantum
measurement that can perfectly distinguish the states, and it
is equal to one if the states are indistinguishable by any measurement
[10], [11]. The fidelity generalizes the Bhattacharyya
parameter used in the classical setting [4]. Naturally, we would
expect the channel $W$ to be perfectly reliable if $F(W) = 0$ and
completely unreliable if $F(W) = 1$. The fidelity also serves
as a coarse bound on the probability of error in discriminating
the states $\rho_0$ and $\rho_1$ [37], [39].

3 We are using the same terminology as Arikan [4].

Fig. 1. The channel $W_2$ synthesized from the first level of recursion. Thick
lines denote classical systems while thin lines denote quantum systems (this is
our convention for the other figures as well). The depicted gate acting on the
channel input is a classical controlled-NOT (CNOT) gate, where the filled-in
circle acts on the source bit and the other circle acts on the target bit. Its truth
table is $(u_1, u_2) \to (u_1 \oplus u_2, u_2)$.

We would expect the symmetric Holevo information
$I(W) \approx 1$ if and only if the channel's fidelity $F(W) \approx 0$
and vice versa: $I(W) \approx 0 \Leftrightarrow F(W) \approx 1$. The following
proposition makes this intuition rigorous, and it serves as
a generalization of Arikan's first proposition regarding the
relationship between rate and reliability. We provide its proof
in the appendix.

Proposition 1: For any binary input classical-quantum
channel of the above form, the following bounds hold

\[ I(W) \le \sqrt{1 - F(W)}. \tag{2} \]
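
The bound in (2) can be checked numerically in a simple special case. The sketch below assumes qubit outputs described by Bloch vectors $r_0$ and $r_1$, and it uses the standard closed form for the qubit fidelity, $F = \frac{1}{2}\big(1 + r_0 \cdot r_1 + \sqrt{(1 - |r_0|^2)(1 - |r_1|^2)}\big)$, which is a known qubit identity and not something stated in the text. The grid of test states and the function names are illustrative choices.

```python
import math

def binary_entropy(p):
    """Shannon entropy (in bits) of the distribution (p, 1 - p)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def qubit_entropy(r):
    """von Neumann entropy of the qubit state with Bloch vector r."""
    norm = min(1.0, math.sqrt(sum(c * c for c in r)))
    return binary_entropy((1.0 + norm) / 2.0)

def holevo_information(r0, r1):
    """Symmetric Holevo information I(W) for qubit outputs."""
    avg = [(a + b) / 2.0 for a, b in zip(r0, r1)]
    return qubit_entropy(avg) - qubit_entropy(r0) / 2.0 - qubit_entropy(r1) / 2.0

def fidelity(r0, r1):
    """Qubit fidelity F = || sqrt(rho_0) sqrt(rho_1) ||_1^2 in Bloch-vector form."""
    dot = sum(a * b for a, b in zip(r0, r1))
    slack0 = max(0.0, 1.0 - sum(a * a for a in r0))
    slack1 = max(0.0, 1.0 - sum(b * b for b in r1))
    return 0.5 * (1.0 + dot + math.sqrt(slack0 * slack1))

def smallest_gap(steps=60):
    """Minimum of sqrt(1 - F(W)) - I(W) over a grid of pure and mixed state pairs."""
    gap = math.inf
    for purity in (1.0, 0.8, 0.5):
        for k in range(steps + 1):
            theta = math.pi * k / steps
            r0 = (0.0, 0.0, purity)
            r1 = (purity * math.sin(theta), 0.0, purity * math.cos(theta))
            bound = math.sqrt(max(0.0, 1.0 - fidelity(r0, r1)))
            gap = min(gap, bound - holevo_information(r0, r1))
    return gap

gap = smallest_gap()  # non-negative (up to rounding) if (2) holds on the grid
```

A non-negative `gap` on this grid is only a sanity check of (2) for a few qubit examples, not a substitute for the proof in the appendix.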

A. Channel Polarization

The channel polarization phenomenon occurs after synthesizing
a set of $N$ classical-quantum channels $\{W_N^{(i)} : 1 \le i \le N\}$
from $N$ independent copies of the classical-quantum
channel $W$. The effect is known as "polarization" because a
fraction of the channels $W_N^{(i)}$ become perfect for data
transmission⁴ in the sense that $I(W_N^{(i)}) \approx 1$ for the channels in
this fraction, while the channels in the complementary fraction
become completely useless in the sense that $I(W_N^{(i)}) \approx 0$ in
the limit as $N$ becomes large. Also, the fraction of channels
that do not exhibit polarization vanishes as $N$ becomes large.
One can induce the polarization effect by means of channel
combining and channel splitting.

4 One cannot expect to transmit more than one classical bit over a perfect
qubit channel due to Holevo's bound [27].

1) Channel Combining: The channel combining phase
takes copies of a classical-quantum channel $W$ and builds from
them an $N$-fold classical-quantum channel $W_N$ in a recursive
way, where $N$ is any power of two: $N = 2^n$, $n \ge 0$. The
zeroth level of recursion merely sets $W_1 = W$. The first level
of recursion combines two copies of $W_1$ and produces the
channel $W_2$, defined as

\[ W_2 : u_1 u_2 \to W_2^{B_1 B_2}(u_1, u_2), \]

where

\[ W_2^{B_1 B_2}(u_1, u_2) = \rho_{u_1 \oplus u_2}^{B_1} \otimes \rho_{u_2}^{B_2}. \tag{3} \]

Figure 1 depicts this first level of recursion.

The second level of recursion takes two copies of $W_2$ and
produces the channel $W_4$:

\[ W_4 : u_1 u_2 u_3 u_4 \to W_4^{B_1 B_2 B_3 B_4}(u_1, u_2, u_3, u_4), \tag{4} \]

where

\[ W_4^{B_1 B_2 B_3 B_4}(u_1, u_2, u_3, u_4) = W_2^{B_1 B_2}(u_1 \oplus u_2, u_3 \oplus u_4) \otimes W_2^{B_3 B_4}(u_2, u_4), \]

so that

\[ W_4^{B_1 B_2 B_3 B_4}(u_1, u_2, u_3, u_4) = \rho_{u_1 \oplus u_2 \oplus u_3 \oplus u_4}^{B_1} \otimes \rho_{u_3 \oplus u_4}^{B_2} \otimes \rho_{u_2 \oplus u_4}^{B_3} \otimes \rho_{u_4}^{B_4}. \]

Figure 2 depicts the second level of recursion.

Fig. 2. The second level of recursion in the channel combining phase.

The operation $R_4$ in Figure 2 is a permutation that takes
$(u_1, u_2, u_3, u_4) \to (u_1, u_3, u_2, u_4)$. One can then readily
check that the mapping from the row vector $u_1^4$ to the channel
inputs $x_1^4$ is a linear map given by $x_1^4 = u_1^4 G_4$ with

\[ G_4 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 \end{bmatrix}. \]

The general recursion at the $n$th level is to take two copies
of $W_{N/2}$ and synthesize a channel $W_N$ from them. The first
part is to transform the input sequence $u^N$ according to the
following rule for all $i \in \{1, \ldots, N/2\}$:

\[ s_{2i-1} = u_{2i-1} \oplus u_{2i}, \qquad s_{2i} = u_{2i}. \]

The next part of the transformation is a "reverse shuffle" $R_N$
that performs the transformation:

\[ (s_1, s_2, s_3, s_4, \ldots, s_{N-1}, s_N) \to (s_1, s_3, \ldots, s_{N-1}, s_2, s_4, \ldots, s_N). \]

The resulting bit sequence is the input to the two copies of
$W_{N/2}$.

The overall transformation on the input sequence $u^N$ is a
linear transformation given by $x^N = u^N G_N$ where

\[ G_N = B_N F^{\otimes n}, \tag{5} \]

where

\[ F = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}, \]

and $B_N$ is a permutation matrix known as a "bit-reversal"
operation [4].
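
The recursion above can be sketched in a few lines of Python. The snippet below is an illustrative sketch with hypothetical function names: `polar_encode` follows the two-step rule (pairwise XOR, then reverse shuffle) and recurses on the two halves, while `generator_matrix` builds $B_N F^{\otimes n}$ directly so that the two descriptions can be compared on small block lengths. Indices are 0-based in the code, unlike the 1-based indices in the text.

```python
from itertools import product

F = [[1, 0], [1, 1]]

def polar_encode(u):
    """Recursive channel combining: returns x^N = u^N G_N over GF(2)."""
    n = len(u)
    if n == 1:
        return list(u)
    s = []
    for i in range(0, n, 2):
        s += [u[i] ^ u[i + 1], u[i + 1]]   # s_{2i-1} = u_{2i-1} xor u_{2i}, s_{2i} = u_{2i}
    shuffled = s[0::2] + s[1::2]           # reverse shuffle R_N: odd positions first
    half = n // 2
    return polar_encode(shuffled[:half]) + polar_encode(shuffled[half:])

def kron(a, b):
    """Kronecker product of two 0/1 matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def generator_matrix(levels):
    """G_N = B_N F^{(x) n} for N = 2**levels: bit-reversed rows of the Kronecker power."""
    f_power = [[1]]
    for _ in range(levels):
        f_power = kron(f_power, F)

    def bit_reverse(i):
        return sum(((i >> k) & 1) << (levels - 1 - k) for k in range(levels))

    return [f_power[bit_reverse(i)] for i in range(1 << levels)]

def multiply(u, g):
    """Row vector times matrix over GF(2)."""
    return [sum(u[i] * g[i][j] for i in range(len(u))) % 2 for j in range(len(g[0]))]

G4 = generator_matrix(2)
G8 = generator_matrix(3)
# Compare the recursive encoder with u^N G_N for every input of length 8.
agree = all(polar_encode(list(u)) == multiply(u, G8) for u in product((0, 1), repeat=8))
```

Each level of the recursion performs $N/2$ XOR operations and there are $\log_2 N$ levels, which is the source of the $O(N \log N)$ encoding complexity.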

2) Channel Splitting: The channel splitting phase consists
of taking the channels $W_N$ induced by the transformation $G_N$
and defining new channels from them. Let $\rho_{u^N}^{B^N}$ denote
the output of the channel $W_N$ when inputting the bit sequence
$u^N$. We define the $i$th split channel $W_N^{(i)}$ as follows:

\[ W_N^{(i)} : u_i \to \rho_{(i),u_i}^{U_1^{i-1} B^N}, \tag{6} \]

where

\[ \rho_{(i),u_i}^{U_1^{i-1} B^N} = \sum_{u_1^{i-1}} \frac{1}{2^{i-1}} |u_1^{i-1}\rangle\langle u_1^{i-1}|^{U_1^{i-1}} \otimes \bar{\rho}_{u_1^{i}}^{B^N}, \tag{7} \]

\[ \bar{\rho}_{u_1^{i}}^{B^N} = \sum_{u_{i+1}^{N}} \frac{1}{2^{N-i}} \rho_{u^N}^{B^N}. \tag{8} \]

We can also write as an alternate notation

\[ W_N^{(i)}(u_i) = \rho_{(i),u_i}^{U_1^{i-1} B^N}. \]

These channels have the same interpretation as Arikan's split
channels [4]: they are the channels induced by a "genie-aided"
quantum successive cancellation decoder, in which the
$i$th decision measurement estimates $u_i$ given that the channel
output $\rho_{u^N}^{B^N}$ is available, after observing the previous bits $u_1^{i-1}$
correctly, and if the distribution over $u_{i+1}^N$ is uniform. These
split channels arise in our analysis of the error probability for
quantum successive cancellation decoding.

3) Channel Polarization: Our channel polarization theorem
below is similar to Arikan's Theorem 1 [4], though ours
applies for classical-quantum channels with binary inputs and
quantum outputs:

Theorem 2 (Channel Polarization): The classical-quantum
channels $W_N^{(i)}$ synthesized from the channel $W^{\otimes N}$ polarize,
in the sense that the fraction of indices $i \in \{1, \ldots, N\}$ for
which $I(W_N^{(i)}) \in (1 - \delta, 1]$ goes to the symmetric Holevo
information $I(W)$ and the fraction for which $I(W_N^{(i)}) \in [0, \delta)$
goes to $1 - I(W)$ for any $\delta \in (0, 1)$ as $N$ goes to infinity
through powers of two.

The proof of the above theorem is identical to Arikan's
proof with a martingale approach [4]. For completeness, we
provide a brief proof in Section IV.

4) Rate of Polarization: It is important to characterize the
speed with which the polarization phenomenon comes into
play for the purpose of proving this paper's polar coding
theorem. We exploit the fidelity $F(W_N^{(i)})$ of the split channels
in order to characterize the rate of polarization:

\[ F(W_N^{(i)}) = F\big(\rho_{(i),0}^{U_1^{i-1} B^N}, \rho_{(i),1}^{U_1^{i-1} B^N}\big). \tag{9} \]

The theorem below exploits the exponential convergence
results of Arikan and Telatar [40], which improved upon
Arikan's original convergence results [4] (note that we could
also use the more general results in Ref. [41]):

Theorem 3 (Rate of Polarization): Given any classical-quantum
channel $W$ with $I(W) > 0$, any $R < I(W)$,
and any constant $\beta < 1/2$, there exists a sequence of sets
$\mathcal{A}_N \subset \{1, \ldots, N\}$ with $|\mathcal{A}_N| \ge NR$ such that

\[ \sum_{i \in \mathcal{A}_N} \sqrt{F(W_N^{(i)})} = o(2^{-N^\beta}). \]

Conversely, suppose that $R > 0$ and $\beta > 1/2$. Then for any
sequence of sets $\mathcal{A}_N \subset \{1, \ldots, N\}$ with $|\mathcal{A}_N| \ge NR$, the
following result holds

\[ \max\Big\{ \sqrt{F(W_N^{(i)})} : i \in \mathcal{A}_N \Big\} = \omega(2^{-N^\beta}). \]

The proof of this theorem exploits our results in Section III
and Theorem 1 of Ref. [40].

B. Polar Coding 

The idea behind polar coding is to exploit the polarization
effect for the construction of a capacity-achieving code. The
sender should transmit the information bits only through
the split channels $W_N^{(i)}$ for which the reliability parameter
$F(W_N^{(i)})$ is close to zero. In doing so, the sender and receiver
can achieve the symmetric Holevo information $I(W)$ of the
channel $W$.

1) Coset Codes: Polar codes arise from a special class of
codes that Arikan calls "$G_N$-coset codes" [4]. These $G_N$-coset
codes are given by the following mapping from the input
sequence $u^N$ to the channel input sequence $x^N$:

\[ x^N = u^N G_N, \]

where $G_N$ is the encoding matrix defined in (5). Suppose that
$\mathcal{A}$ is some subset of $\{1, \ldots, N\}$. Then we can write the above
transformation as follows:

\[ x^N = u_{\mathcal{A}} G_N(\mathcal{A}) \oplus u_{\mathcal{A}^c} G_N(\mathcal{A}^c), \tag{10} \]

where $G_N(\mathcal{A})$ denotes the submatrix of $G_N$ constructed from
the rows of $G_N$ with indices in $\mathcal{A}$ and $\oplus$ denotes vector binary
addition.

Suppose that we fix the set $\mathcal{A}$ and the bit sequence $u_{\mathcal{A}^c}$. The
mapping in (10) then specifies a transformation from the bit
sequence $u_{\mathcal{A}}$ to the channel input sequence $x^N$. This mapping
is equivalent to a linear encoding for a code that Arikan calls
a $G_N$-coset code where the sequence $u_{\mathcal{A}^c} G_N(\mathcal{A}^c)$ identifies
the coset. We can fully specify a coset code by the parameter
vector $(N, K, \mathcal{A}, u_{\mathcal{A}^c})$ where $N$ is the length of the code,
$K = |\mathcal{A}|$ is the number of information bits, $\mathcal{A}$ is a set that identifies
the indices for the information bits, and $u_{\mathcal{A}^c}$ is the vector of
frozen bits. The polar coding rule specifies a way to choose
the indices for the information bits based on the channel over
which the sender is transmitting data.
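
The coset structure in (10) can be illustrated with a short Python sketch. The function names, the 0-based index convention, and the particular information set used in the example are illustrative assumptions; the matrix is built as the bit-reversed rows of the Kronecker power of $F$, following (5).

```python
def kron(a, b):
    """Kronecker product of two 0/1 matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def generator_matrix(levels):
    """G_N = B_N F^{(x) n} for N = 2**levels."""
    f_power = [[1]]
    for _ in range(levels):
        f_power = kron(f_power, [[1, 0], [1, 1]])

    def bit_reverse(i):
        return sum(((i >> k) & 1) << (levels - 1 - k) for k in range(levels))

    return [f_power[bit_reverse(i)] for i in range(1 << levels)]

def coset_encode(info_bits, frozen_bits, info_set, levels):
    """x^N = u_A G_N(A) xor u_{A^c} G_N(A^c), with 0-based row indices in info_set."""
    g = generator_matrix(levels)
    info_rows = sorted(info_set)
    frozen_rows = [i for i in range(len(g)) if i not in info_set]
    x = [0] * len(g)
    pairs = list(zip(info_bits, info_rows)) + list(zip(frozen_bits, frozen_rows))
    for bit, row in pairs:
        if bit:
            x = [a ^ b for a, b in zip(x, g[row])]   # add the selected row over GF(2)
    return x

# Hypothetical example: N = 4, information set {1, 3} (0-based), two frozen vectors.
codeword = coset_encode([1, 1], [0, 0], {1, 3}, levels=2)
shifted = coset_encode([1, 1], [1, 0], {1, 3}, levels=2)
# The two codewords differ by the coset offset u_{A^c} G_N(A^c).
offset = [a ^ b for a, b in zip(codeword, shifted)]
```

Changing only the frozen bits shifts every codeword by the same vector $u_{\mathcal{A}^c} G_N(\mathcal{A}^c)$, which is why that vector identifies the coset.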

2) A Quantum Successive Cancellation Decoder: The specification
of the quantum successive cancellation decoder is
what mainly distinguishes Arikan's polar codes for classical
channels from ours developed here for classical-quantum
channels. Let us begin with a $G_N$-coset code with parameter
vector $(N, K, \mathcal{A}, u_{\mathcal{A}^c})$. The sender encodes the information bit
vector $u_{\mathcal{A}}$ along with the frozen vector $u_{\mathcal{A}^c}$ according to the
transformation in (10). The sender then transmits the encoded
sequence $x^N$ through the classical-quantum channel, leading
to a state $\rho_{x_1} \otimes \cdots \otimes \rho_{x_N}$, which is equivalent to a state $\rho_{u^N}$ up
to the transformation $G_N$. It is then the goal of the receiver to
perform a sequence of quantum measurements on the state $\rho_{u^N}$
in order to determine the bit sequence $u^N$. We are assuming
that the receiver has full knowledge of the frozen vector $u_{\mathcal{A}^c}$
so that he does not make mistakes when decoding these bits.

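Corresponding to the split channels $W_N^{(i)}$ in (6) are the
following projectors that can attempt to decide whether the
input of the $i$th split channel is zero or one:

\[ \Pi_{(i),0}^{U_1^{i-1} B^N} = \Big\{ \sqrt{\rho_{(i),0}^{U_1^{i-1} B^N}} - \sqrt{\rho_{(i),1}^{U_1^{i-1} B^N}} \ge 0 \Big\}, \]

\[ \Pi_{(i),1}^{U_1^{i-1} B^N} = \Big\{ \sqrt{\rho_{(i),0}^{U_1^{i-1} B^N}} - \sqrt{\rho_{(i),1}^{U_1^{i-1} B^N}} < 0 \Big\}, \]

where $\sqrt{A}$ denotes the square root of a positive operator $A$,
$\{B \ge 0\}$ denotes the projector onto the positive eigenspace of
a Hermitian operator $B$, and $\{B < 0\}$ denotes the projection
onto the negative eigenspace of $B$. After some calculations,
we can readily see that

\[ \Pi_{(i),0}^{U_1^{i-1} B^N} = \sum_{u_1^{i-1}} |u_1^{i-1}\rangle\langle u_1^{i-1}|^{U_1^{i-1}} \otimes \Pi_{(i),u_1^{i-1} 0}^{B^N}, \tag{11} \]

\[ \Pi_{(i),1}^{U_1^{i-1} B^N} = \sum_{u_1^{i-1}} |u_1^{i-1}\rangle\langle u_1^{i-1}|^{U_1^{i-1}} \otimes \Pi_{(i),u_1^{i-1} 1}^{B^N}, \tag{12} \]

where, with $\bar{\rho}_{u_1^{i}}^{B^N}$ the averaged output state in (8),

\[ \Pi_{(i),u_1^{i-1} 0}^{B^N} = \Big\{ \sqrt{\bar{\rho}_{u_1^{i-1} 0}^{B^N}} - \sqrt{\bar{\rho}_{u_1^{i-1} 1}^{B^N}} \ge 0 \Big\}, \]

\[ \Pi_{(i),u_1^{i-1} 1}^{B^N} = \Big\{ \sqrt{\bar{\rho}_{u_1^{i-1} 0}^{B^N}} - \sqrt{\bar{\rho}_{u_1^{i-1} 1}^{B^N}} < 0 \Big\}. \]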

The above observations lead to a method for a successive
cancellation decoder similar to Arikan's [4], with the following
decoding rule:

\[ \hat{u}_i = \begin{cases} u_i & \text{if } i \in \mathcal{A}^c \\ h(\hat{u}_1^{i-1}) & \text{if } i \in \mathcal{A} \end{cases}, \]

where $h(\hat{u}_1^{i-1})$ is the outcome of the following $i$th measurement
on the output of the channel (after $i - 1$ measurements
have already been performed):

\[ \Big\{ \Pi_{(i),\hat{u}_1^{i-1} 0}^{B^N}, \; \Pi_{(i),\hat{u}_1^{i-1} 1}^{B^N} \Big\}. \]

We are assuming that the measurement device outputs "0" if
the outcome $\Pi_{(i),\hat{u}_1^{i-1} 0}^{B^N}$ occurs and it outputs "1" otherwise.
(Note that we can set $\Pi_{(i),\hat{u}_1^{i-1} u_i}^{B^N} = I$ if the bit $u_i$ is a frozen
bit.) The above sequence of measurements for the whole bit
stream $u^N$ corresponds to a positive operator-valued measure
(POVM) $\{\Lambda_{u^N}\}$ where

\[ \Lambda_{u^N} = \Pi_{(1),u_1}^{B^N} \cdots \Pi_{(i),u_1^{i-1} u_i}^{B^N} \cdots \Pi_{(N),u_1^{N-1} u_N}^{B^N} \cdots \Pi_{(i),u_1^{i-1} u_i}^{B^N} \cdots \Pi_{(1),u_1}^{B^N}. \]

The above decoding strategy is suboptimal in two regards. 
First, the decoder assumes that the future bits are unknown 
(and random) even if the receiver has full knowledge of 
the future frozen bits (this suboptimality is similar to the 
suboptimality of Arikan's decoder [4]). Second, the measurement
operators for making a decision are suboptimal as well
because we choose them to be projectors onto the positive 
eigenspace of the difference of the square roots of two density 
operators. The optimal bitwise decision rule is to choose these 
operators to be the Helstrom-Holevo projector onto the positive 
eigenspace of the difference of two density operators [34],
[35]. Having our quantum successive cancellation decoder
operate in these two different suboptimal ways allows for 
us to analyze its performance easily (though, note that we 
could just as well have used Helstrom-Holevo measurements 
to obtain bounds on the error probability). This suboptimality 
is asymptotically negligible because the symmetric Holevo 
information is still an achievable rate for data transmission 
even for the above choice of measurement operators. 
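
The control flow of the decoding rule described in this subsection can be sketched classically. In the Python sketch below, the callback `measure(i, previous_estimates)` is a purely hypothetical stand-in for the $i$th quantum measurement; the code does not simulate any quantum state or projector, it only shows how frozen bits are copied, how information bits are estimated in order, and how each decision is conditioned on the earlier estimates. Indices are 0-based.

```python
def successive_cancellation_decode(block_length, info_set, frozen_bits, measure):
    """Return the estimates of u^N, one index at a time.

    measure(i, previous_estimates) -> 0 or 1 is a hypothetical placeholder for the
    outcome h of the i-th measurement, given the estimates already made.
    """
    frozen_iter = iter(frozen_bits)
    estimates = []
    for i in range(block_length):
        if i in info_set:
            # Information bit: decide from the (placeholder) measurement outcome.
            estimates.append(measure(i, tuple(estimates)))
        else:
            # Frozen bit: known to the receiver, so no measurement is needed.
            estimates.append(next(frozen_iter))
    return estimates

# Toy usage with a "genie" callback that simply reads a hidden input sequence.
hidden_input = [1, 0, 1, 1]
measurement_log = []

def genie_measure(i, previous_estimates):
    measurement_log.append((i, previous_estimates))
    return hidden_input[i]

decoded = successive_cancellation_decode(
    4, {1, 3}, [hidden_input[0], hidden_input[2]], genie_measure
)
```

The log shows that a measurement is requested only for indices in the information set and that each request receives all earlier estimates, which mirrors the conditioning of the $i$th measurement on $\hat{u}_1^{i-1}$.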

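3) Polar Code Performance: The probability of error
$P_e(N, K, \mathcal{A}, u_{\mathcal{A}^c})$ for code length $N$, number $K$ of information
bits, set $\mathcal{A}$ of information bits, and choice $u_{\mathcal{A}^c}$ for the
frozen bits is as follows:

\[ P_e(N, K, \mathcal{A}, u_{\mathcal{A}^c}) = 1 - \frac{1}{2^K} \sum_{u_{\mathcal{A}}} \mathrm{Tr}\{\Lambda_{u^N} \rho_{u^N}\} \]

\[ = 1 - \frac{1}{2^K} \sum_{u_{\mathcal{A}}} \mathrm{Tr}\Big\{ \Pi_{(N),u_1^{N-1} u_N}^{B^N} \cdots \Pi_{(i),u_1^{i-1} u_i}^{B^N} \cdots \Pi_{(1),u_1}^{B^N} \, \rho_{u^N} \, \Pi_{(1),u_1}^{B^N} \cdots \Pi_{(i),u_1^{i-1} u_i}^{B^N} \cdots \Pi_{(N),u_1^{N-1} u_N}^{B^N} \Big\}, \]

where we are assuming a particular choice of the bits $u_{\mathcal{A}^c}$
in the sequence of projectors
$\Pi_{(N),u_1^{N-1} u_N}^{B^N} \cdots \Pi_{(i),u_1^{i-1} u_i}^{B^N} \cdots \Pi_{(1),u_1}^{B^N}$
and the convention mentioned before that
$\Pi_{(i),u_1^{i-1} u_i}^{B^N} = I$ if $u_i$ is a frozen bit. We are also assuming that
the sender transmits the information sequence $u_{\mathcal{A}}$ with uniform
probability $2^{-K}$. The probability of error $P_e(N, K, \mathcal{A})$
averaged over all choices of the frozen bits is then

\[ P_e(N, K, \mathcal{A}) = \frac{1}{2^{N-K}} \sum_{u_{\mathcal{A}^c}} P_e(N, K, \mathcal{A}, u_{\mathcal{A}^c}) \]

\[ = 1 - \frac{1}{2^N} \sum_{u^N} \mathrm{Tr}\Big\{ \Pi_{(N),u_1^{N-1} u_N}^{B^N} \cdots \Pi_{(i),u_1^{i-1} u_i}^{B^N} \cdots \Pi_{(1),u_1}^{B^N} \, \rho_{u^N} \, \Pi_{(1),u_1}^{B^N} \cdots \Pi_{(i),u_1^{i-1} u_i}^{B^N} \cdots \Pi_{(N),u_1^{N-1} u_N}^{B^N} \Big\}. \tag{13} \]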

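One of the main contributions of this paper is the following
proposition regarding the average ensemble performance of
polar codes with a quantum successive cancellation decoder:

Proposition 4: For any classical-quantum channel $W$ with
binary inputs and quantum outputs and any choice of
$(N, K, \mathcal{A})$, the following bound holds

\[ P_e(N, K, \mathcal{A}) \le 2 \sqrt{ \sum_{i \in \mathcal{A}} \frac{1}{2} \sqrt{F(W_N^{(i)})} }. \]

Thus, there exists a frozen vector $u_{\mathcal{A}^c}$ for each $(N, K, \mathcal{A})$ such
that

\[ P_e(N, K, \mathcal{A}, u_{\mathcal{A}^c}) \le 2 \sqrt{ \sum_{i \in \mathcal{A}} \frac{1}{2} \sqrt{F(W_N^{(i)})} }. \]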

4) Polar Coding Theorem: Proposition 4 immediately leads
to the definition of polar codes for classical-quantum channels:

Definition 5 (Polar Code): A polar code for $W$ is a $G_N$-coset
code with parameters $(N, K, \mathcal{A}, u_{\mathcal{A}^c})$ where the information
set $\mathcal{A}$ is such that $|\mathcal{A}| = K$ and

\[ F(W_N^{(i)}) \le F(W_N^{(j)}) \quad \text{for all } i \in \mathcal{A} \text{ and } j \in \mathcal{A}^c. \]
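
The selection rule in Definition 5 amounts to keeping the $K$ indices with the smallest fidelities. The Python sketch below illustrates only that selection step; the fidelity values in the example are made-up numbers, since computing $F(W_N^{(i)})$ for an actual channel is a separate problem, and the function name and tie-breaking by index are assumptions for illustration.

```python
import math

def choose_information_set(fidelities, rate):
    """Pick K = floor(N * rate) indices (0-based) with the smallest fidelity values."""
    n = len(fidelities)
    k = math.floor(n * rate)
    # Sort indices by fidelity; ties are broken by index (an arbitrary choice).
    ranked = sorted(range(n), key=lambda i: (fidelities[i], i))
    return sorted(ranked[:k])

# Hypothetical fidelity values for N = 8 split channels (not computed from any channel).
example_fidelities = [0.9, 0.5, 0.6, 0.1, 0.7, 0.2, 0.3, 0.01]
info_set = choose_information_set(example_fidelities, rate=0.5)
frozen_set = [i for i in range(len(example_fidelities)) if i not in info_set]
```

By construction, every index in `info_set` has a fidelity no larger than that of any index in `frozen_set`, which is exactly the ordering condition in Definition 5.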

We can finally state the polar coding theorem for classical-quantum
channels. Consider a classical-quantum channel $W$
and a real number $R > 0$. Let

\[ P_e(N, R) = P_e(N, \lfloor NR \rfloor, \mathcal{A}), \]

with the information bit set chosen according to the polar
coding rule in Definition 5. So $P_e(N, R)$ is the block error
probability for polar coding over $W$ with blocklength $N$, rate
$R$, and quantum successive cancellation decoding averaged
uniformly over the frozen bits $u_{\mathcal{A}^c}$.

Theorem 6 (Polar Coding Theorem): For any classical-quantum
channel $W$ with binary inputs and quantum outputs,
a fixed $R < I(W)$, and $\beta < 1/2$, the block error probability
$P_e(N, R)$ satisfies the following bound:

\[ P_e(N, R) = o(2^{-\frac{1}{2} N^\beta}). \]

The polar coding theorem above follows as a straightforward
corollary of Theorem 3 and Proposition 4.

III. Recursive Channel Transformations 

This section delves into more detail regarding recursive
channel combining and channel splitting. Recall the channel
combining in (3)-(5) and the channel splitting in (6)-(8). These
allowed for us to take $N$ independent copies of a classical-quantum
channel $W^{\otimes N}$ and transform them into the $N$ split
channels $W_N^{(1)}, \ldots, W_N^{(N)}$. We show here how to break
the channel transformation into a series of single-step transformations.
Much of the discussion here parallels Arikan's
discussion in Sections II and III of Ref. [4].

We obtain a pair of channels $W^-$ and $W^+$ from two
independent copies of a channel $W : x \to \rho_x$ by a single-step
transformation if it holds that

\[ W^- : u_1 \to \rho^-_{u_1}, \]

where

\[ \rho^-_{u_1} = \sum_{u_2} \frac{1}{2} \, \rho_{u_1 \oplus u_2}^{B_1} \otimes \rho_{u_2}^{B_2}. \tag{14} \]

Also, it should hold that

\[ W^+ : u_2 \to \rho^+_{u_2}, \]

where

\[ \rho^+_{u_2} = \sum_{u_1} \frac{1}{2} \, |u_1\rangle\langle u_1|^{U_1} \otimes \rho_{u_1 \oplus u_2}^{B_1} \otimes \rho_{u_2}^{B_2}. \tag{15} \]

We use the following notation to denote such a transformation:

\[ (W, W) \to (W^-, W^+). \]

Additionally, we choose the notation $W^-$ and $W^+$ so that
$W^-$ denotes the worse channel and $W^+$ denotes the better
channel. Figure 3 depicts the channels $W^-$ and $W^+$.

Fig. 3. The channels $W^-$ and $W^+$ induced from channel combining and
channel splitting. The channel $W^-$ with input $u_1$ is induced by selecting the
bit $U_2$ uniformly at random, passing both $u_1$ and $U_2$ through the encoder,
and then through the two channel uses. The channel $W^+$ with input $u_2$ is
induced by selecting $U_1$ uniformly at random, copying it to another bit (via
the classical CNOT gate), sending both $U_1$ and $u_2$ through the encoder, and
the outputs are the quantum outputs and the bit $U_1$.

Thus, from the above, we can write $(W, W) \to (W_2^{(1)}, W_2^{(2)})$
because, by the definition in (6), we have

\[ \rho_{(1),u_1}^{B_1 B_2} = \rho^-_{u_1}, \qquad \rho_{(2),u_2}^{U_1 B_1 B_2} = \rho^+_{u_2}. \]

We can actually write more generally



(w { ;\wtf) ^ (w 2N 



(2t-l) 



II 



(2i)N 



2A ! 



(16) 



which follows as a corollary to 

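Proposition 7: For any $n \geq 0$, $N = 2^n$, and $1 \leq i \leq N$, it
holds that

$$W_{2N}^{(2i-1)}(u_{2i-1}) = \sum_{u_{2i}} \frac{1}{2}\, W_N^{(i)}(u_{2i-1} \oplus u_{2i}) \otimes W_N^{(i)}(u_{2i}), \tag{17}$$

$$W_{2N}^{(2i)}(u_{2i}) = \sum_{u_{2i-1}} \frac{1}{2}\, |u_{2i-1}\rangle\langle u_{2i-1}| \otimes W_N^{(i)}(u_{2i-1} \oplus u_{2i}) \otimes W_N^{(i)}(u_{2i}), \tag{18}$$

with $W_N^{(i)}$ defined in (6).

Proof: The proof of the above proposition is similar to
the proof of Arikan's Proposition 3 [4]. ■

We can justify the relationship in (16) by observing that
(17) and (18) have the same form as (14) and (15) with the
following substitutions:

$$W \leftarrow W_N^{(i)}, \quad W^- \leftarrow W_{2N}^{(2i-1)}, \quad W^+ \leftarrow W_{2N}^{(2i)}, \quad u_1 \leftarrow u_{2i-1}, \quad u_2 \leftarrow u_{2i}.$$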
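The single-step transformation in (14) and (15) can be made concrete by listing which product states enter each mixture. The following is a minimal sketch, not part of the original construction: the function names `minus_mixture` and `plus_mixture` are hypothetical, and each output state is represented only by its classical labels rather than by a density operator.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def minus_mixture(u1):
    """Components of rho^-_{u1} as in (14).

    Each entry (weight, (x1, x2)) stands for weight * rho_{x1} (x) rho_{x2},
    where x1 = u1 XOR u2 and x2 = u2, with u2 chosen uniformly at random.
    """
    return [(HALF, (u1 ^ u2, u2)) for u2 in (0, 1)]

def plus_mixture(u2):
    """Components of rho^+_{u2} as in (15).

    Each entry (weight, u1, (x1, x2)) stands for
    weight * |u1><u1| (x) rho_{x1} (x) rho_{x2}, with u1 chosen uniformly
    at random and handed to the receiver as a classical register.
    """
    return [(HALF, u1, (u1 ^ u2, u2)) for u1 in (0, 1)]

if __name__ == "__main__":
    for u in (0, 1):
        print("minus", u, minus_mixture(u))
        print("plus ", u, plus_mixture(u))
```

For example, `minus_mixture(0)` lists the pairs (0, 0) and (1, 1), which corresponds to the equal mixture of $\rho_0 \otimes \rho_0$ and $\rho_1 \otimes \rho_1$.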



A. Transformation of Rate and Reliability 

This section considers how both the rate $I(W_N^{(i)})$ and
reliability $F(W_N^{(i)})$ evolve under the general transformation
in (16). All proofs of the results in this section appear in the
appendix.

Proposition 8: Suppose that $(W,W) \to (W^-,W^+)$ for
some channels satisfying (14)-(15). Then the following rate
conservation and polarizing relations hold:

$$I(W^-) + I(W^+) = 2I(W), \tag{19}$$

$$I(W^-) \leq I(W^+). \tag{20}$$

We can conclude from the above two relations that

$$I(W^-) \leq I(W) \leq I(W^+).$$

The following proposition states how the reliability evolves 
under the channel transformation: 

Proposition 9: Suppose $(W, W) \to (W^-,W^+)$ for some
channels satisfying (14)-(15). Then

$$\sqrt{F(W^+)} = F(W), \tag{21}$$

$$\sqrt{F(W^-)} \leq 2\sqrt{F(W)} - F(W), \tag{22}$$

$$F(W^-) \geq F(W) \geq F(W^+). \tag{23}$$

By combining (21) with (22), we observe that the reliability
only improves under a single-step transformation:

$$\sqrt{F(W^-)} + \sqrt{F(W^+)} \leq 2\sqrt{F(W)}.$$

The above propositions for the single-step transformation 
lead us to the following proposition in the general case: 

Proposition 10: For any classical-quantum channel $W$,
$N = 2^n$, $n \geq 0$, and $1 \leq i \leq N$, the local transformation in
(16) preserves rate and improves reliability in the following
sense:

$$I(W_{2N}^{(2i-1)}) + I(W_{2N}^{(2i)}) = 2I(W_N^{(i)}), \tag{24}$$

$$\sqrt{F(W_{2N}^{(2i-1)})} + \sqrt{F(W_{2N}^{(2i)})} \leq 2\sqrt{F(W_N^{(i)})}. \tag{25}$$

Channel splitting moves rate and reliability "away from the
center":

$$I(W_{2N}^{(2i-1)}) \leq I(W_N^{(i)}) \leq I(W_{2N}^{(2i)}).$$

The reliability terms satisfy

$$\sqrt{F(W_{2N}^{(2i)})} = F(W_N^{(i)}), \tag{26}$$

$$\sqrt{F(W_{2N}^{(2i-1)})} \leq 2\sqrt{F(W_N^{(i)})} - F(W_N^{(i)}), \tag{27}$$

and the cumulative rate and reliability satisfy

$$\sum_{i=1}^{N} I(W_N^{(i)}) = N I(W), \tag{28}$$

$$\sum_{i=1}^{N} \sqrt{F(W_N^{(i)})} \leq N\sqrt{F(W)}. \tag{29}$$

The above proposition follows directly from Propositions 7,
8, and 9. The relations in (28) and (29) follow from applying
(24) and (25) repeatedly.
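The reliability relations above can be iterated numerically to see how upper bounds on $\sqrt{F(W_N^{(i)})}$ spread out. The sketch below is illustrative only: it propagates an upper bound `z` through the maps $z \mapsto 2z - z^2$ (for index $2i-1$) and $z \mapsto z^2$ (for index $2i$), which gives valid upper bounds because both maps are nondecreasing on $[0,1]$. It does not compute the true fidelities of the worse channels, and the function name is hypothetical.

```python
def reliability_upper_bounds(z0, n):
    """Upper bounds on sqrt(F(W_N^{(i)})) for N = 2**n, listed for i = 1..N.

    z0 is sqrt(F(W)) (or an upper bound on it) for the original channel.
    """
    bounds = [z0]
    for _ in range(n):
        nxt = []
        for z in bounds:
            nxt.append(2 * z - z * z)  # bound for the worse channel, index 2i-1
            nxt.append(z * z)          # relation for the better channel, index 2i
        bounds = nxt
    return bounds

if __name__ == "__main__":
    z0, n = 0.5, 10
    b = reliability_upper_bounds(z0, n)
    print("sum of bounds / N:", sum(b) / len(b))       # stays at z0
    print("fraction below 1e-3:", sum(x < 1e-3 for x in b) / len(b))
```

Since $(2z - z^2) + z^2 = 2z$, the bounds sum to $N z_0$ at every level, in line with the cumulative bound (29).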



IV. Channel Polarization 

We are now in a position to prove Theorem 2 on channel
polarization. The idea behind the proof of this theorem is
identical to Arikan's proof of his Theorem 1 in Ref. [4]: with
the relationships in Propositions 8 and 9 already established,
we can readily exploit the martingale proof technique. Thus,
we only provide a brief summary of the proof of Theorem 2
by following the presentation in Chapter 2 of Ref. 0. 

Consider the channel $W_N^{(i)}$. Let $b_1 \cdots b_n$ denote an $n$-bit
binary expansion of the channel index $i$ and let
$W_{(b_1,\ldots,b_n)} \equiv W_N^{(i)}$. Then we can construct the
channel $W_{(b_1,\ldots,b_k)}$ by combining two copies of
$W_{(b_1,\ldots,b_{k-1})}$ according to (17) if $b_k = 0$
or by combining two copies of $W_{(b_1,\ldots,b_{k-1})}$ according to (18)
if $b_k = 1$. We repeatedly construct all the way from $b_1$ until
$b_n$ with the above rule.
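The index-to-path rule can be sketched as follows. This is a hedged illustration: it assumes that the bits $b_1 \cdots b_n$ are the binary digits of $i - 1$, most significant first (so that $b_k = 0$ selects index $2i-1$ and $b_k = 1$ selects index $2i$ at each step), and it tracks only an upper bound on $\sqrt{F}$ along the path using (26) and (27). The helper names are hypothetical.

```python
def index_to_bits(i, n):
    """Bits b_1..b_n (most significant first) of i - 1, for 1 <= i <= 2**n.

    Assumption: the expansion is taken of the zero-based index i - 1.
    """
    if not 1 <= i <= 2 ** n:
        raise ValueError("channel index out of range")
    return [((i - 1) >> (n - k)) & 1 for k in range(1, n + 1)]

def bound_along_path(z0, bits):
    """Propagate an upper bound on sqrt(F) along the path b_1..b_n."""
    z = z0
    for b in bits:
        # b = 0: worse channel, bound (27); b = 1: better channel, relation (26)
        z = z * z if b == 1 else 2 * z - z * z
    return z

if __name__ == "__main__":
    n = 3
    for i in range(1, 2 ** n + 1):
        bits = index_to_bits(i, n)
        print(i, bits, bound_along_path(0.5, bits))
```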

Arikan's idea was to represent the channel construction as a
random birth process in order to analyze its limiting behavior.
In order to do so, we let $\{B_n : n \geq 1\}$ be a sequence of IID
uniform Bernoulli random variables, where we define each
over a probability space $(\Omega, \mathcal{F}, P)$. Let $\mathcal{F}_0$ denote the trivial
$\sigma$-field. Also, let $\{\mathcal{F}_n : n \geq 1\}$ denote the $\sigma$-fields that the
random variables $(B_1, \ldots, B_n)$ generate. We also assume that
$\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \cdots \subseteq \mathcal{F}_n$. Let $W_0 = W$ and let $\{W_n : n \geq 0\}$
denote a sequence of operator-valued random variables that
forms a tree process where $W_{n+1}$ is constructed from two
copies of $W_n$ according to (17) if $B_{n+1} = 0$ and according
to (18) if $B_{n+1} = 1$. The output space of the operator-valued
random variable $W_n$ is equal to $\{W_{2^n}^{(i)}\}_{i=1}^{2^n}$. We are not really
concerned with the channel process $\{W_n : n \geq 0\}$ but more
so with the fidelities $\{F(W_N^{(i)})\}$ and Holevo informations
$\{I(W_N^{(i)})\}$. Thus, we can simply analyze the limiting behavior
of the two random processes $\{F_n : n \geq 0\} \equiv \{\sqrt{F(W_n)} : n \geq 0\}$
and $\{I_n : n \geq 0\} \equiv \{I(W_n) : n \geq 0\}$. By the
definitions of the random variables $F_n$ and $I_n$, it follows that

$$\Pr\{I_n \in (a,b)\} = \frac{1}{2^n}\,\Big|\Big\{i : I(W_{2^n}^{(i)}) \in (a,b)\Big\}\Big|,$$

$$\Pr\{F_n \in (a,b)\} = \frac{1}{2^n}\,\Big|\Big\{i : \sqrt{F(W_{2^n}^{(i)})} \in (a,b)\Big\}\Big|.$$



We then have the following lemma. 

Lemma 11: The sequence $\{(F_n, \mathcal{F}_n) : n \geq 0\}$ is a bounded
super-martingale, and the sequence $\{(I_n, \mathcal{F}_n) : n \geq 0\}$ is a
bounded martingale.

Proof: Let $b_1 \cdots b_n$ be a particular realization of the
random sequence $B_1 \cdots B_n$. Then the conditional expectation
satisfies

$$\begin{aligned}
\mathbb{E}\{I_{n+1} \mid B_1 = b_1, \ldots, B_n = b_n\}
&= \tfrac{1}{2} I(W_{(b_1,\ldots,b_n,0)}) + \tfrac{1}{2} I(W_{(b_1,\ldots,b_n,1)}) \\
&= I(W_{(b_1,\ldots,b_n)}) \\
&= I_n,
\end{aligned}$$

where the second equality follows from the definition of
$W_{(b_1,\ldots,b_n,0)}$ and $W_{(b_1,\ldots,b_n,1)}$ and Proposition 10. The proof
for $\{F_n\}$ similarly follows from the definitions and
Proposition 10. The boundedness condition follows because
$0 \leq I(W), F(W) \leq 1$ for any classical-quantum channel $W$ with
binary inputs and quantum outputs. ■
We can now finally prove Theorem 2 regarding channel
polarization. Given that $\{I_n\}$ is a bounded martingale and
$\{F_n\}$ is a bounded super-martingale, the limits $\lim_{n\to\infty} I_n$
and $\lim_{n\to\infty} F_n$ converge almost surely and in $L^1$ to the
random variables $I_\infty$ and $F_\infty$. The convergence implies that
$\mathbb{E}\{|F_{n+1} - F_n|\} \to 0$ as $n \to \infty$. By the definition of the
process $\{F_n\}$, it holds that $F_{n+1} = F_n^2$ with probability $1/2$, so
that

$$\mathbb{E}\{|F_{n+1} - F_n|\} \geq \tfrac{1}{2}\,\mathbb{E}\{|F_n(1 - F_n)|\} \geq 0.$$

It then follows that $\mathbb{E}\{|F_n(1 - F_n)|\} \to 0$ as $n \to \infty$, which
in turn implies that $\mathbb{E}\{|F_\infty(1 - F_\infty)|\} = 0$. We conclude
that $F_\infty \in \{0, 1\}$ almost surely. Combining this result with
Proposition 1 proves that $I_\infty \in \{0,1\}$ almost surely. Finally,
we have that $\Pr\{I_\infty = 1\} = \mathbb{E}\{I_\infty\} = \mathbb{E}\{I_0\} = I(W)$
because $I_n$ is a martingale.

V. Performance of Polar Coding 

We can now analyze the performance under the above
successive cancellation decoding scheme and provide a proof
of Proposition 4. The proof of Theorem 6 readily follows by
applying Proposition 4 and Theorem 3.

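First recall the following "non-commutative union bound"
of Sen (Lemma 3 in Ref. [25]):

$$1 - \mathrm{Tr}\{\Pi_N \cdots \Pi_1\, \rho\, \Pi_1 \cdots \Pi_N\} \leq 2\sqrt{\sum_{i=1}^{N} \mathrm{Tr}\{(I - \Pi_i)\rho\}}, \tag{30}$$

which holds for projectors $\Pi_1, \ldots, \Pi_N$ and a density operator
$\rho$. We begin by applying the above inequality to $P_e(N, K, A)$
(defined in (13)):

$$\begin{aligned}
P_e(N, K, A)
&= \frac{1}{2^N}\sum_{u^N}\Big(1 - \mathrm{Tr}\Big\{\Pi^{B^N}_{(N),u_1^{N-1}u_N} \cdots \Pi^{B^N}_{(1),u_1}\, \rho_{u^N}\, \Pi^{B^N}_{(1),u_1} \cdots \Pi^{B^N}_{(N),u_1^{N-1}u_N}\Big\}\Big) \\
&\leq \frac{1}{2^N}\sum_{u^N} 2\sqrt{\sum_{i=1}^{N} \mathrm{Tr}\Big\{\big(I - \Pi^{B^N}_{(i),u_1^{i-1}u_i}\big)\rho_{u^N}\Big\}} \\
&= \frac{1}{2^N}\sum_{u^N} 2\sqrt{\sum_{i\in A} \mathrm{Tr}\Big\{\big(I - \Pi^{B^N}_{(i),u_1^{i-1}u_i}\big)\rho_{u^N}\Big\}} \\
&\leq 2\sqrt{\frac{1}{2^N}\sum_{u^N}\sum_{i\in A} \mathrm{Tr}\Big\{\big(I - \Pi^{B^N}_{(i),u_1^{i-1}u_i}\big)\rho_{u^N}\Big\}},
\end{aligned}$$

where the second equality follows from our convention that
$\Pi^{B^N}_{(i),u_1^{i-1}u_i} = I$ if $u_i$ is a frozen bit and the second inequality
follows from concavity of the square root. Continuing, we have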



ieA u N 



\ 



E E E 2 E ^^fCr 1 ^"''} 



\ 



where we define 

nf. = /-nf, . 

The first equality follows from exchanging the sums. The
second equality follows from expanding the sum and
normalization $\sum_{u^N}$. The third equality follows from bringing the
sum $\sum_{u_{i+1}^N}$ inside the trace.

Footnote to (30): We say that Sen's bound is a "non-commutative union bound" because
it is analogous to the following union bound from probability theory:
$\Pr\{(A_1 \cap \cdots \cap A_N)^c\} = \Pr\{A_1^c \cup \cdots \cup A_N^c\} \leq \sum_{i=1}^{N} \Pr\{A_i^c\}$, where
$A_1, \ldots, A_N$ are events. The analogous bound for projector logic would
be $\mathrm{Tr}\{(I - \Pi_1 \cdots \Pi_N \cdots \Pi_1)\rho\} \leq \sum_{i=1}^{N} \mathrm{Tr}\{(I - \Pi_i)\rho\}$, if we think of
$\Pi_1 \cdots \Pi_N \cdots \Pi_1$ as a projector onto the intersection of subspaces. However, the
above bound only holds if the projectors $\Pi_1, \ldots, \Pi_N$ are commuting (choosing
$\Pi_1 = |+\rangle\langle +|$, $\Pi_2 = |0\rangle\langle 0|$, and $\rho = |0\rangle\langle 0|$ gives a counterexample). If
the projectors are non-commuting, then Sen's bound in (30) is the next best
thing and suffices for our purposes here.

Continuing,



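$$\begin{aligned}
&= 2\sqrt{\sum_{i\in A}\sum_{u_1^i}\frac{1}{2^i}\, \mathrm{Tr}\Big\{\big(I - \Pi^{B^N}_{(i),u_1^{i-1}u_i}\big)\,\bar{\rho}^{B^N}_{u_1^i}\Big\}} \\
&= 2\sqrt{\sum_{i\in A}\frac{1}{2}\sum_{u_i}\sum_{u_1^{i-1}}\frac{1}{2^{i-1}}\, \mathrm{Tr}\Big\{\big(I - \Pi^{B^N}_{(i),u_1^{i-1}u_i}\big)\,\bar{\rho}^{B^N}_{u_1^i}\Big\}} \\
&= 2\sqrt{\sum_{i\in A}\frac{1}{2}\sum_{u_i} \mathrm{Tr}\Big\{\Big(I - \sum_{u_1^{i-1}} |u_1^{i-1}\rangle\langle u_1^{i-1}| \otimes \Pi^{B^N}_{(i),u_1^{i-1}u_i}\Big)\Big(\sum_{u_1^{i-1}}\frac{1}{2^{i-1}}\, |u_1^{i-1}\rangle\langle u_1^{i-1}| \otimes \bar{\rho}^{B^N}_{u_1^i}\Big)\Big\}}.
\end{aligned}$$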

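The first equality is from the definition in (8). The second
equality is from exchanging sums. The third equality is from
the fact that

$$\sum_x p(x)\, \mathrm{Tr}\{A_x \rho_x\} = \mathrm{Tr}\Big\{\Big(\sum_x |x\rangle\langle x| \otimes A_x\Big)\Big(\sum_{x'} p(x')\, |x'\rangle\langle x'| \otimes \rho_{x'}\Big)\Big\}.$$

Continuing,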



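$$\begin{aligned}
&= 2\sqrt{\sum_{i\in A}\frac{1}{2}\sum_{u_i} \mathrm{Tr}\Big\{\big(I - \Pi^{U_1^{i-1}B^N}_{(i),u_i}\big)\,\rho^{U_1^{i-1}B^N}_{(i),u_i}\Big\}} \\
&\leq 2\sqrt{\sum_{i\in A}\frac{1}{2}\sqrt{F(W_N^{(i)})}}.
\end{aligned}$$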



The first equality is from the observations in ( fTTfL?) and the 
definition in (6). The final inequality follows from Lemma 3.2
of Ref. [37] and the definition in (9). This completes the proof
of Proposition 4.
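Sen's bound (30) and the footnoted counterexample can be checked by hand on a single qubit. The sketch below does this numerically with plain 2x2 real matrices; it uses the projectors and state named in the footnote ($\Pi_1 = |+\rangle\langle +|$, $\Pi_2 = |0\rangle\langle 0|$, $\rho = |0\rangle\langle 0|$), and the helper functions are introduced here only for illustration.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def sub(a, b):
    """Entrywise difference of two 2x2 matrices."""
    return [[a[i][j] - b[i][j] for j in range(2)] for i in range(2)]

def trace(a):
    return a[0][0] + a[1][1]

I2 = [[1.0, 0.0], [0.0, 1.0]]
P_PLUS = [[0.5, 0.5], [0.5, 0.5]]   # Pi_1 = |+><+|
P_ZERO = [[1.0, 0.0], [0.0, 0.0]]   # Pi_2 = |0><0|
RHO = [[1.0, 0.0], [0.0, 0.0]]      # rho = |0><0|

# Left-hand side of (30): 1 - Tr{Pi_2 Pi_1 rho Pi_1 Pi_2}
seq = matmul(P_ZERO, matmul(P_PLUS, matmul(RHO, matmul(P_PLUS, P_ZERO))))
error = 1 - trace(seq)

# Sum of the individual "failure" terms Tr{(I - Pi_i) rho}
sum_terms = (trace(matmul(sub(I2, P_PLUS), RHO))
             + trace(matmul(sub(I2, P_ZERO), RHO)))
sen_rhs = 2 * math.sqrt(sum_terms)

# Naive projector-logic union bound: Tr{(I - Pi_1 Pi_2 Pi_1) rho} vs sum_terms
sandwich = matmul(P_PLUS, matmul(P_ZERO, P_PLUS))
naive_lhs = trace(matmul(sub(I2, sandwich), RHO))

if __name__ == "__main__":
    print("error:", error, "Sen bound:", sen_rhs)
    print("naive lhs:", naive_lhs, "sum of terms:", sum_terms)
```

In this instance the naive union bound fails because its left side exceeds the sum of the individual terms, while the left side of (30) remains below Sen's right-hand side.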

We state the proof of Theorem 6 for completeness. Invoking
Theorem 3, there exists a sequence of sets $A_N$ with size
$|A_N| \geq NR$ for any $R < I(W)$ and $\beta < 1/2$ such that

$$\sum_{i\in A_N} \sqrt{F(W_N^{(i)})} = o\big(2^{-N^{\beta}}\big),$$

and thus

$$\sqrt{\sum_{i\in A_N} \sqrt{F(W_N^{(i)})}} = o\big(2^{-\frac{1}{2}N^{\beta}}\big).$$

This bound holds if we choose the set $A_N$ according to the
polar coding rule because this rule minimizes the above sum
by definition. Theorem 6 follows by combining Proposition 4
with this fact about the polar coding rule.
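The polar coding rule picks the index set that minimizes the sum of the reliabilities $\sqrt{F(W_N^{(i)})}$. The sketch below illustrates only the selection step. As a loudly stated assumption, it ranks channels by upper bounds obtained from the recursions $z \mapsto 2z - z^2$ and $z \mapsto z^2$ rather than by the true values of $\sqrt{F(W_N^{(i)})}$, which this paper does not show how to compute; the function names are hypothetical.

```python
import math

def reliability_upper_bounds(z0, n):
    """Upper bounds on sqrt(F(W_N^{(i)})) for N = 2**n, listed for i = 1..N."""
    bounds = [z0]
    for _ in range(n):
        nxt = []
        for z in bounds:
            nxt.append(2 * z - z * z)  # worse channel, index 2i-1
            nxt.append(z * z)          # better channel, index 2i
        bounds = nxt
    return bounds

def choose_information_set(z0, n, rate):
    """Select ceil(rate * N) indices with the smallest bounds.

    Returns the 1-based index set and the sum of the selected bounds,
    which is a proxy (assumption) for the sum minimized by the coding rule.
    """
    bounds = reliability_upper_bounds(z0, n)
    size = math.ceil(rate * len(bounds))
    ranked = sorted(range(len(bounds)), key=lambda i: bounds[i])
    chosen = sorted(ranked[:size])
    return [i + 1 for i in chosen], sum(bounds[i] for i in chosen)

if __name__ == "__main__":
    info_set, total = choose_information_set(0.5, 4, 0.25)
    print("information set:", info_set)
    print("sum of bounds:", total)
```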

VI. Conclusion 

We have shown how to construct polar codes for channels
with classical binary inputs and quantum outputs, and we
showed that they can achieve the symmetric Holevo information
rate for classical communication. In fact, for a quantum
channel with binary pure state outputs, such as a
binary-phase-shift-keyed (BPSK) coherent-state optical communication
alphabet, the symmetric Holevo information rate is the
ultimate channel capacity [15], which is therefore achieved by
our polar code [42]. The general idea behind the construction
is similar to Arikan's [4], but we required several technical
advances in order to demonstrate both channel polarization
at the symmetric Holevo information rate and the operation
of the quantum successive cancellation decoder. To prove
that channel polarization takes hold, we could exploit several
results in the quantum information literature [30], [31], [32],
[10], [33], [11] and some of Arikan's tools. To prove that
the quantum successive cancellation decoder works well, we
exploited some ideas from quantum hypothesis testing [34],
[35], [36], [37], [38] and Sen's recent "non-commutative union
bound" [25]. The result is a near-explicit code construction that
achieves the symmetric Holevo information rate for channels
with classical inputs and quantum outputs. (When we say
"near-explicit," we mean that it still remains open in the
quantum case to determine which synthesized channels are
good or bad.) Also, several works have now appeared on 
polar coding for private classical communication and quantum 
communication flU, ED, ED, ED, ESI, ED, most of which 
use the results developed in this paper. 

One of the main open problems going forward from here 
is to simplify the quantum successive cancellation decoder. 
Arikan could show how to calculate later estimates by exploit- 
ing the results of earlier estimates in an "FFT-like" fashion, 
and this observation reduced the complexity of the decoding 
to $O(N \log N)$. It is not clear to us yet how to reduce the
complexity of the quantum successive cancellation decoder 
because it is not merely a matter of computing formulas, but 
rather a sequence of physical operations (measurements) that 
the receiver needs to perform on the channel output systems. 
If there were some way to perform the measurements on 
smaller systems and then adaptively perform other measure- 
ments based on earlier results, then this would be helpful in 
demonstrating a reduced complexity. 

Another important open question is to devise an efficient 
construction of the polar codes, something that remains an 
open problem even for classical polar codes. However, there 
has been recent work on efficient suboptimal classical polar 
code constructions [48], which one might try to extend to polar
codes for the classical-quantum channel. Finally, extending our 
code and decoder construction to a classical-quantum channel 
with a non-binary (M-ary) alphabet remains a good open line 
for investigation. 



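Appendix

Proof of Proposition 1: The first bound in (1) follows
from Holevo's characterization of the quantum cutoff rate
(Proposition 1 of Ref. [32]). In particular, Holevo proved that
the following inequality holds for all $s \in [0,1]$:

$$I(X;B)_\omega \geq -\log \mathrm{Tr}\Big\{\Big(\sum_x p_X(x)\,(\omega_x)^{\frac{1}{1+s}}\Big)^{1+s}\Big\},$$

where the entropy on the LHS is with respect to a classical-quantum
state

$$\omega^{XB} \equiv \sum_x p_X(x)\, |x\rangle\langle x|^X \otimes \omega_x^B.$$

By setting $s = 1$, the alphabet $\mathcal{X} = \{0, 1\}$, and the distribution
$p_X(x)$ to be uniform, we obtain the bound

$$\begin{aligned}
I(W) &\geq -\log \mathrm{Tr}\Big\{\Big(\tfrac{1}{2}\sqrt{\rho_0} + \tfrac{1}{2}\sqrt{\rho_1}\Big)^2\Big\} \\
&= -\log\Big(\tfrac{1}{4}\,\mathrm{Tr}\big\{\rho_0 + \sqrt{\rho_1}\sqrt{\rho_0} + \sqrt{\rho_0}\sqrt{\rho_1} + \rho_1\big\}\Big) \\
&= -\log\Big(\tfrac{1}{2}\big(1 + \mathrm{Tr}\{\sqrt{\rho_0}\sqrt{\rho_1}\}\big)\Big) \\
&\geq \log\Big(\frac{2}{1 + \sqrt{F(W)}}\Big),
\end{aligned}$$

where the last line follows from

$$\mathrm{Tr}\{\sqrt{\rho_0}\sqrt{\rho_1}\} \leq \mathrm{Tr}\{|\sqrt{\rho_0}\sqrt{\rho_1}|\} = \|\sqrt{\rho_0}\sqrt{\rho_1}\|_1 = \sqrt{F(W)}.$$

The other inequality in (2) follows from a bound in the literature
(Eq. (21) of the cited reference). In particular, it was shown there that

$$I(W) \leq H_2\Big(\tfrac{1}{2}\big(1 - \sqrt{F(W)}\big)\Big),$$

where the binary entropy $H_2(x) \equiv -x\log_2 x - (1 - x)\log_2(1 - x)$.
Combining this with the following
observation that holds for all $0 \leq F(W) \leq 1$ gives the second
inequality:

$$H_2\Big(\tfrac{1}{2}\big(1 - \sqrt{F(W)}\big)\Big) \leq \sqrt{1 - F(W)}.$$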


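Proof of Proposition 8: These follow from the same line
of reasoning as in the proof of Arikan's Proposition 4 [4]. We
prove the first equality. Consider the mutual information

$$\begin{aligned}
I(U_1U_2;B_1B_2) &= I(X_1X_2;B_1B_2) \\
&= I(X_1;B_1) + I(X_2;B_2) \\
&= 2I(W).
\end{aligned}$$

By the chain rule for quantum mutual information [11], we
have

$$\begin{aligned}
I(U_1U_2;B_1B_2) &= I(U_1;B_1B_2) + I(U_2;B_1B_2U_1) \\
&= I(W^-) + I(W^+).
\end{aligned}$$

The inequality follows because

$$\begin{aligned}
I(W^+) &= I(U_2;B_1B_2U_1) \\
&= I(U_2;B_2) + I(U_2;B_1U_1|B_2) \\
&= I(W) + I(U_2;B_1U_1|B_2) \\
&\geq I(W),
\end{aligned}$$

because $I(U_2;B_1U_1|B_2) \geq 0$ [30]. We then have

$$2I(W^+) \geq 2I(W) = I(W^-) + I(W^+),$$

and the inequality follows.

Proof of Proposition 9: Consider the fidelity

$$\begin{aligned}
\sqrt{F(W^+)} &= \sqrt{F(\rho_0^+, \rho_1^+)} \\
&= \sqrt{F\Big(\sum_{u_1}\tfrac{1}{2}\,|u_1\rangle\langle u_1| \otimes \rho_{u_1} \otimes \rho_0,\ \sum_{u_1}\tfrac{1}{2}\,|u_1\rangle\langle u_1| \otimes \rho_{u_1\oplus 1} \otimes \rho_1\Big)} \\
&= \sqrt{F\Big(\sum_{u_1}\tfrac{1}{2}\,|u_1\rangle\langle u_1| \otimes \rho_{u_1},\ \sum_{u_1}\tfrac{1}{2}\,|u_1\rangle\langle u_1| \otimes \rho_{u_1\oplus 1}\Big)}\ \sqrt{F(\rho_0, \rho_1)} \\
&= F(\rho_0, \rho_1) \\
&= F(W).
\end{aligned}$$

The first two equalities follow by definition. The third equality
follows from the multiplicativity of fidelity under tensor
product states [10], [11]:

$$F(\rho \otimes \sigma, \tau \otimes \omega) = F(\rho, \tau)\,F(\sigma, \omega).$$

The fourth equality follows from the following formula that
holds for the fidelity of classical-quantum states:

$$\sqrt{F\Big(\sum_x p(x)\,|x\rangle\langle x| \otimes \rho_x,\ \sum_x p(x)\,|x\rangle\langle x| \otimes \sigma_x\Big)} = \sum_x p(x)\sqrt{F(\rho_x, \sigma_x)}.$$

We now consider the second inequality. The fidelity also has
the following characterization as the minimum Bhattacharya
overlap between distributions induced by a POVM on the
states:

$$F(\rho_0, \rho_1) = \min_{\{\Lambda_m\}} \Big(\sum_m \sqrt{\mathrm{Tr}\{\Lambda_m \rho_0\}\,\mathrm{Tr}\{\Lambda_m \rho_1\}}\Big)^2.$$

So

$$\sqrt{F(W^-)} = \min_{\{\Gamma_k\}} \sum_k \sqrt{\mathrm{Tr}\{\Gamma_k^{B_1B_2} \rho_0^-\}\,\mathrm{Tr}\{\Gamma_k^{B_1B_2} \rho_1^-\}}.$$

Let $\{\Lambda_m\}$ denote the POVM that achieves the minimum for
$\sqrt{F(W)}$:

$$\sqrt{F(W)} = \sqrt{F(\rho_0, \rho_1)} = \min_{\{\Lambda_m\}} \sum_m \sqrt{\mathrm{Tr}\{\Lambda_m \rho_0\}\,\mathrm{Tr}\{\Lambda_m \rho_1\}}.$$

Then the POVM $\{\Lambda_l \otimes \Lambda_m\}$ is a particular POVM that can
try to distinguish the states $\rho_0^-$ and $\rho_1^-$. We then have

$$\begin{aligned}
\sqrt{F(W^-)} &\leq \sum_{l,m} \sqrt{\mathrm{Tr}\{(\Lambda_l \otimes \Lambda_m)\rho_0^-\}\,\mathrm{Tr}\{(\Lambda_l \otimes \Lambda_m)\rho_1^-\}} \\
&= \frac{1}{2}\sum_{l,m} \Big[\big(\mathrm{Tr}\{\Lambda_l \rho_0^{B_1}\}\mathrm{Tr}\{\Lambda_m \rho_0^{B_2}\} + \mathrm{Tr}\{\Lambda_l \rho_1^{B_1}\}\mathrm{Tr}\{\Lambda_m \rho_1^{B_2}\}\big) \\
&\qquad\quad \times \big(\mathrm{Tr}\{\Lambda_l \rho_0^{B_1}\}\mathrm{Tr}\{\Lambda_m \rho_1^{B_2}\} + \mathrm{Tr}\{\Lambda_l \rho_1^{B_1}\}\mathrm{Tr}\{\Lambda_m \rho_0^{B_2}\}\big)\Big]^{1/2}.
\end{aligned}$$

Making the assignments

$$\alpha_m \equiv \mathrm{Tr}\{\Lambda_m \rho_0^{B_2}\}, \quad \beta_l \equiv \mathrm{Tr}\{\Lambda_l \rho_0^{B_1}\}, \quad \gamma_l \equiv \mathrm{Tr}\{\Lambda_l \rho_1^{B_1}\}, \quad \delta_m \equiv \mathrm{Tr}\{\Lambda_m \rho_1^{B_2}\},$$

the above expression is equal to

$$\frac{1}{2}\sum_{l,m} \sqrt{(\beta_l\alpha_m + \gamma_l\delta_m)(\beta_l\delta_m + \gamma_l\alpha_m)}.$$

We can then exploit Arikan's inequality in Appendix D of
Ref. [4] to have

$$\begin{aligned}
\frac{1}{2}\sum_{l,m} \sqrt{(\beta_l\alpha_m + \gamma_l\delta_m)(\beta_l\delta_m + \gamma_l\alpha_m)}
&\leq \frac{1}{2}\sum_{l,m} \big(\sqrt{\beta_l\alpha_m} + \sqrt{\gamma_l\delta_m}\big)\big(\sqrt{\beta_l\delta_m} + \sqrt{\gamma_l\alpha_m}\big) - \sum_{l,m}\sqrt{\alpha_m\beta_l\gamma_l\delta_m} \\
&= 2\sqrt{F(W)} - F(W).
\end{aligned}$$

The inequality $F(W^-) \geq F(W)$ follows from concavity
of fidelity and its multiplicativity under tensor products [10],
[11]:

$$\begin{aligned}
F(W^-) &= F(\rho_0^-, \rho_1^-) \\
&\geq \tfrac{1}{2}F\big(\rho_0^{B_1} \otimes \rho_0^{B_2},\ \rho_1^{B_1} \otimes \rho_0^{B_2}\big) + \tfrac{1}{2}F\big(\rho_1^{B_1} \otimes \rho_1^{B_2},\ \rho_0^{B_1} \otimes \rho_1^{B_2}\big) \\
&= \tfrac{1}{2}F\big(\rho_0^{B_1}, \rho_1^{B_1}\big)F\big(\rho_0^{B_2}, \rho_0^{B_2}\big) + \tfrac{1}{2}F\big(\rho_1^{B_1}, \rho_0^{B_1}\big)F\big(\rho_1^{B_2}, \rho_1^{B_2}\big) \\
&= \tfrac{1}{2}F\big(\rho_0^{B_1}, \rho_1^{B_1}\big) + \tfrac{1}{2}F\big(\rho_1^{B_1}, \rho_0^{B_1}\big) \\
&= F(W).
\end{aligned}$$

The inequality $F(W) \geq F(W^+)$ follows from the relation
$\sqrt{F(W^+)} = F(W)$ and the fact that $0 \leq F \leq 1$. ■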
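The scalar inequality borrowed from Appendix D of Arikan's paper, which drives the bound on $\sqrt{F(W^-)}$, can be spot-checked on a grid. The sketch below is only a numerical sanity check, not a proof: the argument names `a`, `b`, `g`, `d` are placeholders for four nonnegative numbers, and the grid is an arbitrary choice.

```python
import itertools
import math

def lhs(a, b, g, d):
    """sqrt((b*a + g*d) * (b*d + g*a)) for nonnegative a, b, g, d."""
    return math.sqrt((b * a + g * d) * (b * d + g * a))

def rhs(a, b, g, d):
    """(sqrt(b*a) + sqrt(g*d)) * (sqrt(b*d) + sqrt(g*a)) - 2*sqrt(a*b*g*d)."""
    return ((math.sqrt(b * a) + math.sqrt(g * d))
            * (math.sqrt(b * d) + math.sqrt(g * a))
            - 2 * math.sqrt(a * b * g * d))

# Largest violation lhs - rhs over a coarse grid of values in [0, 1]
grid = [k / 4 for k in range(5)]
worst = max(lhs(*p) - rhs(*p) for p in itertools.product(grid, repeat=4))

if __name__ == "__main__":
    print("largest value of lhs - rhs on the grid:", worst)
```

The difference of squares of the two sides equals $2\sqrt{abgd}\,(\sqrt{a}-\sqrt{d})^2(\sqrt{b}-\sqrt{g})^2$, so equality is expected whenever $a = d$ or $b = g$, and the grid check should report a value that is zero up to rounding.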
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